Now we’re going to look at the microbiota data, from the same Papa et al. 2012 study on paediatric IBD patients and controls.
A note on sequence data processing
These data have already been processed into a table: counts of how often each sequence occurs in each sample.
It started as huge fastq text files, full of As, Cs, Ts and Gs, but we will not practice sequence data processing today, because it takes quite a long time to run.
If you generate your data with us at Medical Microbiology, we will run the latest processing approaches on our computational infrastructure and provide you with the processed data.
Nowadays, we do amplicon sequencing with Illumina MiSeq, HiSeq, or similar technologies, and denoise the output with DADA2 to produce ASV count tables.
The example data we will use today are older. The amplicons were sequenced with “454 pyrosequencing” and clustered into OTUs, but the core principles of analysis remain the same.
And as you will see later today, the same approaches can also be applied to taxonomic abundance tables obtained from shotgun metagenomic sequencing.
First we’re going to use the R skills we practised in part 1A to inspect this data.
SOLUTION: Click me?
In this practical, solution blocks like this are available…
But try and write code independently before looking at the answer!
Attempting to recall knowledge or solve problems is proven to enhance learning.
So don’t look unless you’re really stuck! 💪
After that, we’ll take a look at some specialist R packages for microbiome analysis.
2 Learning goals 🧠
Practice general R skills to inspect microbiota count data and taxonomy
Gain basic familiarity with specialist microbiome R packages: phyloseq and microViz
Learn to calculate, visualise, and interpret standard richness and diversity indices: Observed Richness and (Effective) Shannon diversity
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
4 Read and inspect data 🔍
Read the metadata file from part 1A. We will read the RDS file "all_metadata.rds"
Rows: 91 Columns: 36350
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): sample
dbl (36349): OTU_00001, OTU_00002, OTU_00003, OTU_00004, OTU_00005, OTU_0000...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 36349 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (7): OTU, Kingdom, Phylum, Class, Order, Family, Genus
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Notice that in the phyloseq object the sample names have underscores, but they have hyphens in the sample metadata.
meta
# A tibble: 90 × 17
ID sample case_control diagnosis activity active gender ethnicity
<chr> <chr> <chr> <fct> <fct> <fct> <chr> <chr>
1 099A 099-AX Case UC severe active female White
2 199A 199-AX Control Other control control female Other
3 062B 062-BZ Case CD mild active male White
4 194A 194-AZ Case UC mild active male White
5 166A 166-AX Case UC severe active female Black
6 219A 219-AX Case UC inactive inactive female <NA>
7 132A 132-AX Case CD mild active female White
8 026A 026-AX Case UC mild active male White
9 102A 102-AZ Control Other control control male White
10 140A 140-AX Control Other control control female White
# ℹ 80 more rows
# ℹ 9 more variables: family_history <chr>, age_years <dbl>,
# immunosuppression_level <fct>, medication <chr>, ibd_fhx <lgl>, abx <lgl>,
# steroids <lgl>, imsp_other <lgl>, imsp_any <lgl>
We can fix that with the stringr package str_replace function.
# A tibble: 90 × 17
ID sample case_control diagnosis activity active gender ethnicity
<chr> <chr> <chr> <fct> <fct> <fct> <chr> <chr>
1 099A 099_AX Case UC severe active female White
2 199A 199_AX Control Other control control female Other
3 062B 062_BZ Case CD mild active male White
4 194A 194_AZ Case UC mild active male White
5 166A 166_AX Case UC severe active female Black
6 219A 219_AX Case UC inactive inactive female <NA>
7 132A 132_AX Case CD mild active female White
8 026A 026_AX Case UC mild active male White
9 102A 102_AZ Control Other control control male White
10 140A 140_AX Control Other control control female White
# ℹ 80 more rows
# ℹ 9 more variables: family_history <chr>, age_years <dbl>,
# immunosuppression_level <fct>, medication <chr>, ibd_fhx <lgl>, abx <lgl>,
# steroids <lgl>, imsp_other <lgl>, imsp_any <lgl>
Now make them row names, check they all match, and add the metadata to the phyloseq object.
# A tibble: 90 × 18
.sample_name ID sample case_control diagnosis activity active gender
<chr> <chr> <chr> <chr> <fct> <fct> <fct> <chr>
1 099_AX 099A 099_AX Case UC severe active female
2 199_AX 199A 199_AX Control Other control control female
3 062_BZ 062B 062_BZ Case CD mild active male
4 194_AZ 194A 194_AZ Case UC mild active male
5 166_AX 166A 166_AX Case UC severe active female
6 219_AX 219A 219_AX Case UC inactive inactive female
7 132_AX 132A 132_AX Case CD mild active female
8 026_AX 026A 026_AX Case UC mild active male
9 102_AZ 102A 102_AZ Control Other control control male
10 140_AX 140A 140_AX Control Other control control female
# ℹ 80 more rows
# ℹ 10 more variables: ethnicity <chr>, family_history <chr>, age_years <dbl>,
# immunosuppression_level <fct>, medication <chr>, ibd_fhx <lgl>, abx <lgl>,
# steroids <lgl>, imsp_other <lgl>, imsp_any <lgl>
# get the OTU table, or part of itotu_get(ps, taxa =1:5, samples =c("132_AX", "166_AX", "102_AZ"))
Taxonomy Table: [3 taxa by 6 taxonomic ranks]:
Kingdom Phylum Class Order
OTU_00001 "Bacteria" "Firmicutes" "Clostridia" "Clostridiales"
OTU_00002 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacteriales"
OTU_00003 "Bacteria" "Firmicutes" "Clostridia" "Clostridiales"
Family Genus
OTU_00001 "Ruminococcaceae" "Faecalibacterium"
OTU_00002 "Enterobacteriaceae" "Escherichia/Shigella"
OTU_00003 "Lachnospiraceae" "Blautia"
7 Build some bar charts 📊
Sequencing data are compositional. The total number of reads per sample is mostly arbitrary, and so the counts should be interpreted as relative abundances instead of absolute abundances.
A simple way to visualise compositional data is as percentages, proportions of a whole.
Stacked bar charts are great for this: each bar represents one sample, and we can show the abundance data as proportions, after aggregating the counts by taxonomy.
As a last task for this introductory session, we will calculate and visualise alpha diversity.
Why is diversity interesting?
Biologically
Higher diversity ecosystems are probably more resilient to perturbations
Lower gut microbiota diversity sometimes accompanies various health problems (in adults)
BUT: diverse == healthy is not TRUE for all ecosystems (e.g. early infant gut microbiome)
So, consider your own data and diversity hypotheses carefully
Practically
Diversity indices provide a simple “one number” summary of each ecosystem
This makes it relatively easy to compare samples, and do statistical testing
8.1 Observed richness
The simplest richness measure is just counting, a.k.a. “Observed Richness”.
Let’s compute the observed richness of genera.
ps_calc_richness() computes the index for each sample and adds it to your sample_data
ps_alpha<-ps%>%ps_calc_richness(rank ="Genus", index ="observed", varname ="N_genera")
8.2 Shannon diversity
Next we will calculate the Shannon diversity index at the level of Genus.
The Shannon index is a commonly used diversity measure, with this formula: \(H = -\sum_{i=1}^Np_i\ln p_i\)
Shannon index is often is denoted with \(H\), and here \(p_i\) denotes the proportional abundance of the \(i\)’th of \(N\) taxa in the sample.
Explanation of the Shannon index formula
For each taxon \(i\), you multiply its proportional abundance \(p_i\) by the natural log of that proportion \(\ln p_i\), and sum those values.
Try it out for yourself to convince yourself you get larger (negative) values for higher proportions.
The highest value you can achieve with \(N\) taxa occurs with equal proportions (e.g. with 20 taxa, maximum diversity occurs if each has a relative abundance of 5%, i.e. 0.05)
Lastly, you change the sign of the result to a positive number, for ease of interpretation (this just makes more intuitive sense: as higher positive numbers indicates higher diversity.)
We will also compute an exponentiated version, the effective number of genera, or effective Shannon index.
Explanation of the Effective Shannon Diversity
The exponent of the Shannon index \(e^H\) represents the number of taxa (genera) that would be present in an evenly abundant ecosystem with the same Shannon index.
The numeric value of the Shannon index itself has no intuitive meaning.
You can compare them, but can’t easily interpret any one number.
So, the concept of “effective numbers” of taxa is useful here.
If your original ecosystem was actually perfectly even, then \(e^H = N\)
Where N is the observed richness
The more uneven an ecosystem, the further \(e^H\) will be from \(N\)
---title: "Practical 1B - microbiota data"subtitle: "2024 Liege microbiome workshop"author: David Barnettdate: last-modifiedformat: htmlkeep-md: falsetheme: light: flatly dark: darklycss: ../../.css/instructions.cssembed-resources: truecode-block-border-left: truecode-block-bg: truetoc: truetoc-location: righttoc-depth: 4toc-expand: 1other-links: - text: "Workshop Overview" href: "https://david-barnett.github.io/2024-Liege-microbiome" icon: "house" target: "_blank" - text: "posit.cloud workspace" href: "https://posit.cloud/spaces/583413" icon: "cloud" target: "_blank"number-sections: truenumber-depth: 3fig-align: centerfig-dpi: 200fig-width: 7.5fig-height: 5fig-responsive: truecode-tools: truecode-fold: falsecode-link: truelightbox: autolink-external-icon: truecache: true---## IntroNow we're going to look at the microbiota data, from the same Papa et al. 2012 study on paediatric IBD patients and controls.::: {.callout-note collapse="true"}### A note on sequence data processingThese data have already been processed into a table: counts of how often each sequence occurs in each sample.It started as huge fastq text files, full of As, Cs, Ts and Gs, but we will not practice sequence data processing today, because it takes quite a long time to run.If you generate your data with us at Medical Microbiology, we will run the latest processing approaches on our computational infrastructure and provide you with the processed data.Nowadays, we do amplicon sequencing with Illumina MiSeq, HiSeq, or similar technologies, and denoise the output with DADA2 to produce ASV count tables.The example data we will use today are older. The amplicons were sequenced with "454 pyrosequencing" and clustered into OTUs, but the core principles of analysis remain the same.And as you will see later today, the same approaches can also be applied to taxonomic abundance tables obtained from shotgun metagenomic sequencing.:::First we're going to use the R skills we practised in part 1A to inspect this data.::: {.callout-note collapse="true" appearance="simple"}### SOLUTION: Click me?- In this practical, solution blocks like this are available...- But try and write code independently before looking at the answer!- Attempting to recall knowledge or solve problems is proven to enhance learning.- So don't look unless you're really stuck! 💪:::After that, we'll take a look at some specialist R packages for microbiome analysis.## Learning goals 🧠- Practice general R skills to inspect microbiota count data and taxonomy- Gain basic familiarity with specialist microbiome R packages: `phyloseq` and `microViz`- Learn to calculate, visualise, and interpret standard richness and diversity indices: Observed Richness and (Effective) Shannon diversity## Load R packages 📦```{r}library(readxl)library(here)library(tidyverse)```## Read and inspect data 🔍Read the metadata file from part 1A. We will read the RDS file `"all_metadata.rds"````{r}meta <-read_rds(file =here("data/papa2012/processed/all_metadata.rds"))```Read the count table: this is stored as a TSV (tab-separated variables) formatted text file.```{r}counts <-read_tsv(file =here("data/papa2012/papa2012_OTU_count_table.txt"))``````{r}counts```#### **Your first challenge:**Read the taxonomy table stored in `"data/papa2012/papa2012_taxonomy_table.txt"`Call the object `taxonomy`::: {.callout-note collapse="true" appearance="simple"}#### SOLUTION:```{r}taxonomy <-read_tsv(file =here("data/papa2012/papa2012_taxonomy_table.txt"))``````{r}taxonomy```:::### Taxonomy tableNow practice inspecting the taxonomy table by completing the following tasks:::: panel-tabset##### Task 1Check how many distinct genera there are.Tip: use `unique()` and `length()`::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:```{r}taxonomy$Genus %>%unique() %>%length()```:::##### Task 2How many OTUs are there in each Phylum?Tip: use `count()` or `table()`::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:```{r}taxonomy %>%count(Phylum)``````{r}taxonomy$Phylum %>%table(useNA ="ifany")```:::##### Task 3What genera are in the phylum Actinobacteria?Tip: use `filter()`::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:```{r}taxonomy %>%filter(Phylum =="Actinobacteria") %>%count(Genus, sort =TRUE)```Or for just their names:```{r}taxonomy %>%filter(Phylum =="Actinobacteria") %>%pull(Genus) %>%unique()```::::::### OTU count tableFirst let's plot a histogram of OTU number 1.```{r, fig.width=8, fig.height=2}counts %>% ggplot(aes(OTU_00001)) + geom_histogram(bins = 50)```Looks like there are a lot of zeros!```{r}table(OTU00001_has_0_counts = counts$OTU_00001 ==0, useNA ="ifany")```------------------------------------------------------------------------Attempt the following tasks to explore further!::: panel-tabset##### Task 1Use `filter()` to plot only the non-zero entries for OTU 1.Try also using `+ scale_x_log10()` to transform the plot axis scale.::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:```{r, fig.width=8, fig.height=2}counts %>% filter(OTU_00001 != 0) %>% ggplot(aes(OTU_00001)) + geom_histogram(bins = 30) + scale_x_log10()```:::##### Task 2Use `mutate()` to create a temporary log-transformed variable `log_OTU1` and plot its distribution.*Note that you can't do log(0), so you will need to add 1 to all values first, i.e.* `log10(OTU_00001 + 1)`::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:```{r, fig.width=8, fig.height=2}counts %>% mutate(log_OTU1 = log10(OTU_00001 + 1)) %>% ggplot(aes(log_OTU1)) + geom_histogram(bins = 30)```:::##### Task 3Plot OTU 1 against OTU 2 as a scatter plot using `geom_point`.Remember to add 1 before log10 transformation of both variables.::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:```{r, fig.width=4, fig.height=4}#| out-width: "50%"#| fig-align: "center"counts %>% mutate(log_OTU1 = log10(OTU_00001 + 1)) %>% mutate(log_OTU2 = log10(OTU_00002 + 1)) %>% ggplot(aes(x = log_OTU1, y = log_OTU2)) + geom_point()```:::##### Task 4But what type of bacteria do these amplicon sequences belong to?Look up the classifications of OTU 1 and OTU 2 in the taxonomy table.::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:```{r}taxonomy %>%filter(OTU %in%c("OTU_00001", "OTU_00002"))```You could also select a subset of variables to look at:```{r}taxonomy %>%filter(OTU %in%c("OTU_00001", "OTU_00002")) %>%select(OTU, Phylum, Family, Genus) ```:::##### Task 5?Now make histograms and look up the taxonomy for the next thousand OTUs...::: {.callout-note collapse="true" appearance="simple"}##### SOLUTION:Okay, that was a joke. 🤡It is possible to make a thousand plots, because R is very good at repetitive tasks.But, this would not be very useful, because you could not look at them all.In the next section of this practical, we will explore smarter ways to analyse microbiota compositions.::::::------------------------------------------------------------------------## Assemble a phyloseq 🛠️So far we have not used any R packages specialised for microbiome data.Let's do so, because it will make our next tasks a lot easier!```{r}library(phyloseq)library(microViz)```We will start by combining our three dataframes into one `phyloseq` object### OTU counts + taxonomyphyloseq uses row or column names to match OTUs across the count and taxonomy tables.In addition, the taxonomy table must contain only taxonomic ranks in descending order, and the count table must be converted to a numeric matrix.```{r}count_matrix <- counts %>%select(-sample) %>%as.matrix()rownames(count_matrix) <- counts$sample``````{r}tax_matrix <- taxonomy %>%select(!OTU) %>%as.matrix()rownames(tax_matrix) <- taxonomy$OTU``````{r}ps <-phyloseq(otu_table(count_matrix, taxa_are_rows =FALSE), tax_table(tax_matrix))``````{r}ps```### Adding sample metadataphyloseq uses row names to match samples (across the OTU count table and the sample metadata).```{r}head(sample_names(ps))```Notice that in the phyloseq object the sample names have underscores, but they have hyphens in the sample metadata.```{r}meta```We can fix that with the `stringr` package `str_replace` function.```{r}meta$sample <- meta$sample %>%str_replace(pattern ="-", replacement ="_")meta```Now make them row names, check they all match, and add the metadata to the phyloseq object.```{r}meta_df <-as.data.frame(meta)rownames(meta_df) <- meta$sampleall(rownames(meta_df) %in%sample_names(ps))``````{r}sample_data(ps) <- meta_dfps```## microViz 🦠👁️microViz provides tools for working with phyloseq objects.Let's take a look with some basic microViz functions.```{r}samdat_tbl(ps) # retrieve sample data as a tibble``````{r}# get the OTU table, or part of itotu_get(ps, taxa =1:5, samples =c("132_AX", "166_AX", "102_AZ"))``````{r}# get the taxonomy tablett_get(ps) %>%head(3)```## Build some bar charts 📊Sequencing data are compositional. The total number of reads per sample is mostly arbitrary, and so the counts should be interpreted as relative abundances instead of absolute abundances.A simple way to visualise compositional data is as percentages, proportions of a whole.Stacked bar charts are great for this: each bar represents one sample, and we can show the abundance data as proportions, after aggregating the counts by taxonomy.```{r, message=FALSE}#| fig-width: 8#| fig-height: 2ps %>% tax_fix() %>% comp_barplot("Phylum", n_taxa = 4, label = NULL)```This plot is aggregated into phyla, which is easy to read, and provides basic information.We see that most samples contain a mixture of *Firmicutes* and *Bacteroidetes*, but some appear to be dominated by *Proteobacteria* instead.::: {.callout-warning collapse="true"}### Beware Fickle Phyla!Beware, phylum names have changed recently!- *Actinobacteria* is now *Actinomycetota*- *Bacteroidetes* is now *Bacteroidota*- *Proteobacteria* is now *Pseudomonadota* (!)- *Firmicutes* is now *Bacillota* (!!)In all prior research, you will see the original names, but in coming years, the new names will be increasingly adopted.The best source for checking/searching official and alternative names is probably [LPSN at bacterio.net](https://www.bacterio.net/){target="_blank"}:::```{r, message=FALSE}#| fig-width: 8#| fig-height: 4ps %>% tax_fix() %>% comp_barplot("Genus", n_taxa = 11, merge_other = F, label = NULL)```This plot is aggregated into genera, which provides more detail, but is harder to read, and we cannot give every genus a distinct colour.We see that many samples contain a relatively large proportion of *Bacteroides* or *Faecalibacterium* but there is quite some further variation!::: {.callout-note collapse="true"}### `tax_fix()` ?**Filling gaps in the taxonomy table**- You might have noticed that the taxonomy table has some `NA` values.- This often occurs when a sequence cannot be uniquely classified at the Genus level.- The short 16S amplicon sequenced may only allow a unique classification at Family rank, or above.```{r}tt_get(ps) %>%as.data.frame() %>%filter(is.na(Genus)) %>%head(3)```- We need to fill those gaps! and we can do this with `tax_fix()`, which copies info down from a higher rank to fill the gaps.- It often works fine with default settings, but sometimes there are more complicated taxonomy table problems- To look at and fix the taxonomy table interactively, try `tax_fix_interactive(ps)` in the console- (You may need to allow popups on your browser!)::: {.callout-warning collapse="false"}### Press Stop!Running `tax_fix_interactive(ps)` will open a new web browser tab.If you don't see anything after running the command, you might need to unblock pop-ups!When you are done looking, click the **red STOP 🛑 button in the R console**!:::Let's update our `ps` phyloseq object with the default fix.```{r}ps <-tax_fix(ps)```Check the first few taxa now look correct?:::## Discover diversity 🌳🌲🌴As a last task for this introductory session, we will calculate and visualise alpha diversity.::: {.callout-note collapse="true"}### Why is diversity interesting?#### Biologically- Higher diversity ecosystems are probably more resilient to perturbations- Lower gut microbiota diversity sometimes accompanies various health problems (in adults)- BUT: `diverse == healthy` is not `TRUE` for all ecosystems (e.g. early infant gut microbiome)- So, consider your own data and diversity hypotheses carefully#### Practically- Diversity indices provide a simple "one number" summary of each ecosystem- This makes it relatively easy to compare samples, and do statistical testing:::### Observed richness- The simplest richness measure is just counting, a.k.a. "Observed Richness".- Let's compute the observed richness of genera.- `ps_calc_richness()` computes the index for each sample and adds it to your sample_data```{r}ps_alpha <- ps %>%ps_calc_richness(rank ="Genus", index ="observed", varname ="N_genera")```### Shannon diversityNext we will calculate the Shannon diversity index at the level of Genus.- The Shannon index is a commonly used diversity measure, with this formula: $H = -\sum_{i=1}^Np_i\ln p_i$- Shannon index is often is denoted with $H$, and here $p_i$ denotes the proportional abundance of the $i$'th of $N$ taxa in the sample.::: {.callout-note collapse="true"}#### Explanation of the Shannon index formula- For each taxon $i$, you multiply its proportional abundance $p_i$ by the natural log of that proportion $\ln p_i$, and sum those values.- Try it out for yourself to convince yourself you get larger (negative) values for higher proportions.- The highest value you can achieve with $N$ taxa occurs with equal proportions (e.g. with 20 taxa, maximum diversity occurs if each has a relative abundance of 5%, i.e. 0.05)- Lastly, you change the sign of the result to a positive number, for ease of interpretation (this just makes more intuitive sense: as higher positive numbers indicates higher diversity.):::We will also compute an exponentiated version, the effective number of genera, or effective Shannon index.::: {.callout-note collapse="true"}#### Explanation of the **Effective** Shannon DiversityThe exponent of the Shannon index $e^H$ represents the number of taxa (genera) that would be present in an evenly abundant ecosystem with the same Shannon index.- The numeric value of the Shannon index itself has no intuitive meaning.- You can compare them, but can't easily interpret any one number.- So, the concept of "effective numbers" of taxa is useful here.- If your original ecosystem was actually perfectly even, then $e^H = N$- Where N is the observed richness- The more uneven an ecosystem, the further $e^H$ will be from $N$:::```{r}ps_alpha <- ps_alpha %>%ps_calc_diversity(index ="shannon", rank ="Genus", varname ="Shannon_Genus") %>%ps_mutate(Effective_Shannon_Genus =exp(Shannon_Genus))ps_alpha```The new genus-level Shannon diversity variables are stored in the sample data slot of `ps_alpha````{r}ps_alpha %>%samdat_tbl() %>%select(sample, Shannon_Genus, Effective_Shannon_Genus)```### Diversity DistributionsFirst we will plot simple histograms of the richness and diversity values we observe.\Notice the different range of values for each one.::: panel-tabset##### Observed Richness```{r, fig.width=6, out.width="70%"}ps_alpha %>% samdat_tbl() %>% pull(N_genera) %>% hist(main = "Observed Richness, Genus")```##### Shannon $H$```{r, fig.width=6, out.width="70%"}ps_alpha %>% samdat_tbl() %>% pull(Shannon_Genus) %>% hist(main = "Shannon Diversity, Genus")```##### Effective Shannon $e^H$```{r, fig.width=6, out.width="70%"}ps_alpha %>% samdat_tbl() %>% pull(Effective_Shannon_Genus) %>% hist(main = "Effective Number of Genera")```:::### Interpreting diversity::: panel-tabset#### **Observed Richness:** Number of GeneraEach sample is sorted and labelled by the observed richness of generaCan you spot samples with equal richness but clear differences in evenness?```{r, message=FALSE, fig.height=11, fig.width=6, out.width="70%"}#| fig-align: "center"#| code-fold: trueps_alpha %>% ps_arrange(N_genera) %>% comp_barplot( tax_level = "Genus", n_taxa = 19, merge_other = FALSE, sample_order = "asis", label = "N_genera" ) + coord_flip()```Don't forget to look at the other tab, where the bars are labelled by their diversity!#### **Diversity:** Shannon Effective Number of GeneraNow each sample is labelled with the effective Shannon diversity of genera ($e^H$)Do you see the general relationship of $e^H$ with sample composition?```{r, message=FALSE, fig.height=11, fig.width=6, out.width="70%"}#| fig-align: "center"#| code-fold: trueps_alpha %>% ps_arrange(Effective_Shannon_Genus) %>% ps_mutate(short_shannon = formatC(Effective_Shannon_Genus, digits = 1, format = "f")) %>% comp_barplot( tax_level = "Genus", n_taxa = 19, merge_other = FALSE, sample_order = "asis", label = "short_shannon" ) + coord_flip()```:::## Save our phyloseqWe have assembled a phyloseq object and calculated richness and diversity measures.Let's store the result of this processing, by writing the phyloseq object to a file.We can use the `saveRDS()` function to do this.```{r}saveRDS(ps_alpha, file =here("data", "papa2012", "processed", "papa12_phyloseq.rds"))``````{r}#| include: falsedir.create(here("data", "papa2012", "processed", "backup"))saveRDS(ps_alpha, file =here("data", "papa2012", "processed", "backup", "papa12_phyloseq.rds"))```## Next! ⏩- Its time to stop exploring and take a break.- In the next lecture you will learn more about the main approaches for microbiota data analysis.- In the next practical session, we will come back to these data with a plan, an analysis plan!- Click here: <https://david-barnett.github.io/2024-Liege-microbiome/practical-2/web/practical2-instructions.html>## Session info<details>```{r}sessioninfo::session_info()```</details>